WHO cardiovascular disease risk prediction model performance in 10 regions, China

Abstract
Objective To validate the World Health Organization (WHO) non-laboratory-based cardiovascular disease risk prediction model in regions of China.
Methods We performed an external validation of the WHO model for East Asia using the data set of China Kadoorie Biobank, an ongoing cohort study with 512 725 participants recruited from 10 regions of China during 2004–2008. We also recalculated the recalibration parameters for the WHO model in each region and evaluated the predictive performance of the model before and after recalibration. We assessed discrimination performance by Harrell's C index.
Findings We included 412 225 participants aged 40–79 years. During a median follow-up of 11 years, 58 035 and 41 262 incident cardiovascular disease cases were recorded in women and men, respectively. Harrell's C of the WHO model was 0.682 in women and 0.700 in men but varied among regions. The WHO model underestimated the 10-year cardiovascular disease risk in most regions. After recalibration in each region, discrimination and calibration were both improved in the overall population. Harrell's C increased from 0.674 to 0.749 in women and from 0.698 to 0.753 in men. The ratios of predicted to observed cases before and after recalibration were 0.189 and 1.027 in women and 0.543 and 1.089 in men.
Conclusion The WHO model for East Asia yielded moderate discrimination for cardiovascular disease in the Chinese population and had limited prediction for cardiovascular disease risk in different regions in China. Recalibration for diverse regions greatly improved discrimination and calibration in the overall population.


Introduction
Cardiovascular diseases, including coronary artery disease and stroke, are the leading causes of death and disability worldwide. 1 Risk prediction models are important tools for identifying high-risk individuals who can benefit from early primary prevention of cardiovascular disease. [2][3][4][5][6] Some terminology related to risk prediction models is explained in Box 1. 7,11 The World Health Organization (WHO) has developed new models to estimate cardiovascular disease risk for people aged 40-80 years in 21 Global Burden of Disease regions, including laboratory-based and non-laboratory-based models. 12 The non-laboratory-based risk model (hereafter called the WHO model) is more applicable in lower-resource regions where blood-based biomarkers, such as lipid levels, are not widely available for all individuals. The WHO model for East Asia was recommended for predicting individuals' cardiovascular disease risk in China. However, the model does not take into account important differences in the geographical patterns of incidence, prevalence and mortality of cardiovascular disease (overall and the main subtypes) or the prevalence of the major contributing risk factors for the disease in China. [13][14][15] A previous study conducted an external validation of the WHO model for East Asia among 29 337 participants from 16 provinces of China. The researchers found that the model overestimated the observed cardiovascular disease risk in China; however, the predictive performance of the model in different regions of China was not evaluated. 16 In the current study we aimed to validate and recalibrate the WHO model for East Asia in a different population of China, using the data set from the China Kadoorie Biobank study which covers 10 diverse regions.
We examined regional differences in the incidence of cardiovascular disease in China by comparing the performance of the WHO model in predicting coronary artery disease and stroke in the study population, before and after separate recalibration in the 10 regions.

Study population
China Kadoorie Biobank is an ongoing prospective study with 512 725 participants aged 30-79 years, enrolled from 10 diverse regions of China (five urban, five rural) starting in 2004-2008. Details of the study have been described elsewhere 17,18 and in the authors' online repository. 19 Briefly, the baseline questionnaire collected information on participants' sociodemographic characteristics, lifestyle behaviours, dietary habits, personal health (including self-reported histories of coronary artery disease, stroke and transient ischaemic attacks) and family medical history. A 10 mL random blood sample was collected for each participant, with the time of the last meal recorded.
For the present analysis, conducted in May 2022, we excluded participants who were younger than 40 years old (77 623 people), those with a history of coronary artery disease (15 286 people) or stroke or transient ischaemic attack (7590 people), and those who had missing data on body mass index (one person) at the baseline survey. We therefore included a total of 412 225 participants (Fig. 1).
We obtained ethical approval from the ethical review committee of the Chinese Center for Disease Control and Prevention in Beijing, China, and the Oxford tropical research ethics committee, University of Oxford, United Kingdom of Great Britain and Northern Ireland. All participants provided written informed consent.

Data collection
The variables we used included sex, age, smoking status, systolic blood pressure and body mass index, all of which are risk predictors in the WHO non-laboratory-based cardiovascular disease model. 12 Details on the collection and definition of each predictor have been described in our previous study. 20
We followed up all participants to determine any cardiovascular disease events experienced since their baseline enrolment. These incident events were identified from local disease and death registries and the national health insurance system, or by directly contacting the participants. 17 A total of 500 029 (97.5%) participants were linked to the Chinese health insurance system. Only 4009 participants (1.0%) were lost to follow-up before the end of follow-up (31 December 2017). Since the period from the baseline survey to the date of loss to follow-up could still be included in analyses, we did not exclude these participants. We used information from records of underlying and multiple causes of death and from primary and secondary diagnoses at discharge from hospital. Trained staff of the China Kadoorie Biobank research team (who were blinded to the baseline information) coded all cardiovascular events using the International Statistical Classification of Diseases and Related Health Problems, 10th revision (ICD-10). The medical records of the first event were retrieved and reviewed by specialist physician adjudicators. 21 By October 2018, of the retrieved medical records of 33 515 coronary artery disease cases (ICD-10 codes: I20-I25), 34 758 ischaemic stroke cases (code: I63), and 5023 haemorrhagic stroke cases (codes: I60-I61), the number of confirmed cases was 29 448 (87.9%), 31 806 (91.5%), and 4041 (80.4%), respectively.
We included only the first cardiovascular disease event during follow-up, unless otherwise specified. For example, if a participant was recorded with both coronary artery disease and stroke (simultaneously or sequentially), we used the date of the first of these two events in the analysis of all types of cardiovascular disease. When coronary artery disease and stroke were analysed as separate outcomes, we considered the dates of the first coronary artery disease event and first stroke event separately.

Box 1. Terminology for the study of the cardiovascular risk prediction model
External validation: Before a model is widely used, the predictive performance of the model usually needs to be estimated in a population other than the one from which the model was developed, a process called external validation. 7
Predictive performance: The accuracy of the predictions made by a model is expressed in terms of discrimination or calibration. 7
Discrimination: Discrimination performance indicates the ability of the model to distinguish between people who did and did not develop the disease of interest, which is usually assessed by Harrell's C index. 7
Harrell's C index: This index estimates the probability of the model correctly predicting who will have a cardiovascular event first in a randomly selected pair of participants. 8 The value of this index is between 0.5 and 1.0. Generally, a C index above 0.7 indicates a good prediction model.
Calibration: Calibration performance indicates the agreement between observed risks and risks predicted by the model, which is usually assessed by the calibration plot. 7
Calibration plot: The mean predicted risks at 10 years are plotted against the observed risks across deciles of predicted risk as a scatter plot. If the observed risks and mean predicted risks agree over the whole range of probabilities, the points lie on a 45-degree line (that is, the slope is 1.0), which indicates ideal calibration performance.
Ratios of predicted to observed cases: A ratio of 1.0 indicates perfect calibration; ratios greater than 1.0 indicate overestimation, and those less than 1.0 indicate underestimation. 9
Nam-D'Agostino test: This is a statistical test for quantitative measurement of calibration performance, whereby a smaller χ2 value represents a better calibration performance. 10
Recalibration: When the calibration performance is not good, the model usually needs to be adjusted to the target population (recalibration) to improve its usefulness. 11
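The definition of Harrell's C index in Box 1 can be made concrete with a short sketch. The function below is only an illustrative reference implementation for right-censored data, not the code used in the study (which was run in Stata); the function name `harrell_c` and the example values are hypothetical.

```python
from typing import Sequence


def harrell_c(time: Sequence[float], event: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's C index for right-censored data (simple O(n^2) version).

    time  : follow-up time of each participant
    event : 1 if the event was observed, 0 if the participant was censored
    risk  : predicted risk (a higher value means an earlier event is expected)
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # each comparable pair is anchored on an observed event
        for j in range(n):
            # A pair is comparable when i had the event strictly before
            # j's event or censoring time.
            if i != j and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five participants, two observed events.
times = [2, 4, 6, 8, 10]
events = [1, 0, 1, 0, 0]
risks = [0.30, 0.10, 0.15, 0.05, 0.20]
c_index = harrell_c(times, events, risks)  # 5 of 6 comparable pairs are concordant
```

In this toy example the only discordant pair is the participant with an event at time 6 (predicted risk 0.15) and the participant still event-free at time 10 (predicted risk 0.20).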

Outcome definitions
The developers of the WHO model recalibrated the original model in 21 Global Burden of Disease regions to adapt the model to the circumstances of different regions. Different data sets were used when developing and recalibrating the model. The definitions of cardiovascular disease outcomes (defined using ICD-10 codes) in the process of deriving the model were narrower than those in the process of recalibrating it. 1, 12 We used both definitions in the present study. 19 Briefly, in the first component of the study (Fig. 2

Statistical analysis
We conducted all analyses separately for women and men. Briefly, we first calculated the 10-year risk of cardiovascular disease for each participant according to the WHO uncalibrated model. 12 Subsequently, in the first component of the study, we recalibrated the calculated risks according to the latest recalibration parameters in 2017 applicable to East Asia (WHO model for East Asia). 12 In the second component of the study, we recalibrated the calculated risks in the 10 study regions of China (WHO model for each region). More details are in the online repository. 19,22 We evaluated the discrimination and calibration performance of the WHO model before and after recalibration in each study region and across the overall study population. We assessed discrimination performance by Harrell's C index. 8 We assessed calibration performance graphically by comparing the mean predicted risks at 10 years with the observed risks across deciles of predicted risks in the calibration plot. The observed 10-year risks were estimated using the Kaplan-Meier method. 23 We calculated the ratios of predicted to observed cases. 9 We used the Nam-D'Agostino test to quantify the agreement or fit (Box 1). 10
We conducted the following sensitivity analyses. First, because the China Kadoorie Biobank cohort was started in 2004-2008, we used recalibration parameters in 2005, 2010, and 2015 that were derived by the developers of the WHO model to recalibrate the WHO model. 19 Second, due to the higher incidence of stroke in China compared with high-income countries, applying the outcome definition used in the derivation process of the WHO model could lead to an artificially low proportion of coronary artery disease in total cardiovascular disease. Therefore, in the second component of this study, we instead adopted the outcome definition of the China Kadoorie Biobank model, which is a previously developed non-laboratory-based risk prediction model and has a broader definition of coronary artery disease and a narrower definition of stroke than the WHO model. 20 More details are in the online repository. 19
The study adhered to the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) statement for reporting. 19,24 We conducted analyses using Stata, version 17.0 (Stata Corp., College Station, United States of America). The figures were produced using R, version 3.6.0 (R Foundation for Statistical Computing, Vienna, Austria).
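The calibration steps described above (Kaplan-Meier observed risk, comparison across groups of predicted risk, and the ratio of predicted to observed cases) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's Stata code: the function names are hypothetical, and the ratio is assumed here to be the sum of predicted risks divided by the number of participants multiplied by the Kaplan-Meier risk at the horizon.

```python
from typing import List, Sequence, Tuple


def km_risk(time: Sequence[float], event: Sequence[int], horizon: float) -> float:
    """Kaplan-Meier cumulative risk, 1 - S(horizon), for right-censored data."""
    survival = 1.0
    event_times = sorted({t for t, e in zip(time, event) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for s in time if s >= t)
        events_at_t = sum(1 for s, e in zip(time, event) if e and s == t)
        survival *= 1.0 - events_at_t / at_risk
    return 1.0 - survival


def calibration_by_group(time: Sequence[float], event: Sequence[int],
                         pred: Sequence[float], horizon: float = 10.0,
                         groups: int = 10) -> List[Tuple[float, float]]:
    """Return (mean predicted risk, Kaplan-Meier observed risk) per risk group.

    With groups=10 the groups are deciles of predicted risk, i.e. the points
    of a calibration plot.
    """
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    rows = []
    for g in range(groups):
        idx = order[g * len(order) // groups:(g + 1) * len(order) // groups]
        if not idx:
            continue
        mean_pred = sum(pred[i] for i in idx) / len(idx)
        observed = km_risk([time[i] for i in idx], [event[i] for i in idx], horizon)
        rows.append((mean_pred, observed))
    return rows


def predicted_to_observed_ratio(time: Sequence[float], event: Sequence[int],
                                pred: Sequence[float], horizon: float = 10.0) -> float:
    """Assumed definition: sum of predicted risks / (n * Kaplan-Meier risk)."""
    predicted_cases = sum(pred)
    observed_cases = len(pred) * km_risk(time, event, horizon)
    return predicted_cases / observed_cases
```

A ratio below 1.0 from `predicted_to_observed_ratio` would correspond to the underestimation described in Box 1, and a ratio above 1.0 to overestimation.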

Study population
A total of 412 225 participants from 10 different regions of China were included in the current study: 241 556 (58.6%) were female and 230 802 (56.0%) were rural residents. The mean age was 54.3 years (standard deviation: 9.2; Table 1). There were substantial differences in the levels of cardiovascular disease risk factors among the 10 study regions. For example, the age-adjusted proportion of daily smokers in Meilan was 0.2% among women and 39.2% among men, while the corresponding proportions in Pengzhou were 10.2% and 64.7%, respectively (full data are in the online repository). 19

Cardiovascular cases
During a median follow-up of 11 years, we recorded 58 035 and 41 262 new cardiovascular disease cases (defined …) among women and men, respectively. The number of coronary artery disease cases according to the definition in the WHO model derivation process (nonfatal, ICD-10 codes: I21-I23 and fatal, codes: I21-I25) was far lower than that according to the definition in the China Kadoorie Biobank model derivation process (any code: I20-I25; Table 1).

Fig. 2. Study design in the validation of the WHO non-laboratory-based cardiovascular disease risk prediction model in 10 regions of China
Note: The WHO risk prediction model that has not been recalibrated according to WHO recalibration criteria.
There were large variations in the 10-year risk of cardiovascular disease overall, and its subtypes, among the 10 study regions. For example, the 10-year risk of stroke according to the definition in the WHO model derivation process (any ICD-10 code: I60-I69) was over 30% in both sexes in Nangang, but lower than 10% in Wuzhong. 19

Validation of model
In the external validation of the WHO model for East Asia, Harrell's C index was 0.682 (95% confidence interval, CI: 0.655-0.710) among women, with substantial variation among regions, indicating moderate discrimination performance (Fig. 3). The C index was lowest in Nangang (0.642; 95% CI: 0.637-0.647) and highest in Wuzhong (0.763; 95% CI: 0.753-0.772). C indices among men were similar to those among women. As for the calibration performance, the WHO model for East Asia underestimated the 10-year risk of cardiovascular disease for the overall population and all study regions except Wuzhong and Tongxiang. 19 After recalibrating the WHO model with recalibration parameters derived in different years, the discrimination performance barely changed, 19 and the underestimation of the 10-year risk of cardiovascular disease persisted. 19

Recalibration of model
After recalibrating the WHO model in each study region, we found that the discrimination performance of the WHO model was improved in the overall study population. Among women, Harrell's C index increased by 0.075 from 0.674 (95% CI: 0.672-0.677) to 0.749 (95% CI: 0.746-0.751). Among men, the C index increased from 0.698 (95% CI: 0.695-0.701) to 0.753 (95% CI: 0.750-0.755), an increment of 0.055. However, recalibration had little effect on the discrimination performance in each study region. For example, among men in Wuzhong, where the change in C index after recalibration was the largest, the C index increased from 0.740 (95% CI: 0.728-0.752) to 0.747 (95% CI: 0.735-0.759), an increment of 0.007 (Table 2). The calibration performance of the recalibrated WHO model was close to 1.0, the ideal level, in the overall study population (Fig. 4) and in the 10 study regions. 19 After recalibration, the discrimination performance of the WHO model was improved in older people (≥ 65 years) and individuals with hypertension, diabetes, low education level and low household income. 19 The recalibrated WHO model was well calibrated in older people and those with hypertension, but slightly underestimated the risk of cardiovascular disease in people with diabetes. 19 When we instead adopted the disease outcome definition of the China Kadoorie Biobank model, the discrimination and calibration performance was improved in the overall study population. 19 However, the recalibrated model still slightly underestimated the 10-year risk of cardiovascular disease in participants with diabetes. 19

Discussion
We found that the overall discrimination of the WHO model was moderate, and the 10-year cardiovascular disease risk of the China Kadoorie Biobank study participants was underestimated in most regions. After recalibration of the WHO model in each study region, the discrimination and calibration performances of the model were greatly improved in the overall study population.
The pooled Harrell's C index of the WHO model for East Asia was only about 0.7 in both sexes, which is lower than in previous studies conducted in Chinese populations. In an external validation study based on the Asia Pacific Cohort Studies Collaboration and the China Multi-Provincial Cohort Study, the pooled C index of the non-laboratory-based WHO model for East Asia was 0.741 (95% CI: 0.725-0.757). 12 When applying the WHO model in the Prediction for Atherosclerotic Cardiovascular Disease Risk in China cohort, the C index was 0.754 (95% CI: 0.731-0.777) in women and 0.762 (95% CI: 0.744-0.781) in men. 16 Differences in the definition of outcomes and in the study population could have influenced the discrimination performance. In the present study, we adopted the same definition of disease outcomes that was used in the recalibration process of the WHO model. This definition includes non-fatal angina (ICD-10 code: I20) for classifying coronary artery disease, and other cerebrovascular diseases (codes: I65-I69) for classifying stroke. 1 Our definition is broader than that adopted by the previous studies in China. 12,16 These differences could partly explain the overestimation of cardiovascular disease risk in the external validation of the Prediction for Atherosclerotic Cardiovascular Disease Risk in China project. 16
The WHO model for East Asia underestimated the cardiovascular disease risk to a variable extent in most study regions. Separate recalibration of the WHO model in each region achieved almost ideal calibration performance. In other words, the observed risks and risks predicted by the model were similar. The findings suggest a universal model is unsuitable for direct application to different regions in China due to large regional differences in the incidence of cardiovascular disease subtypes.
Models need to be recalibrated according to the local prevalence of cardiovascular disease risk factors and disease incidence rates before being applied to a specific region. The Prediction for Atherosclerotic Cardiovascular Disease Risk in China study did not evaluate the calibration performance of the WHO model by region, thus making the conclusions less reliable than in the current study. 16 However, in our study, participants of each study region came from a relatively small geographical area in China. It is not feasible to update the model across the whole country according to the current regional size. A possible approach is first to update the model in a larger area, such as a province, and then to update the model in smaller geographical areas. External validation studies conducted in other regions of China are needed to examine our findings.

Research Cardiovascular disease risk, China Songchun Yang et al.

Figure notes: We defined cardiovascular disease according to the definition used by the recalibration process of the WHO non-laboratory cardiovascular disease risk model. The recalibration parameters applied to China in 2017 were used for the recalibration of the WHO model. Harrell's C index estimates the probability of the model correctly predicting who will have a cardiovascular disease event first in a randomly selected pair of participants. The value of this index is between 0.5 and 1.0. Generally, a C index above 0.7 indicates a good prediction model. We calculated the C index of the total row based on the whole sample, ignoring the region. The vertical line represents the combined C index, which is the sum of the C index of each study region weighted by inverse variance (meta-analysis). We calculated the 95% CI of the combined C index by using the t distribution with nine degrees of freedom. More details of the methods are in the authors' online repository. 19
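The figure notes describe a combined C index obtained by inverse-variance weighting of the 10 regional C indices, with a 95% CI based on the t distribution with nine degrees of freedom. The sketch below shows one way such a calculation could look. The exact variance estimator is not stated in the text, so the between-region weighted variance used here (a Hartung-Knapp-style estimator) is an assumption, as are the function name and the illustrative inputs.

```python
import math
from typing import Sequence, Tuple

# Two-sided 95% critical value of Student's t with 9 degrees of freedom
# (10 regions minus 1). It must be changed if the number of regions changes.
T_975_DF9 = 2.262


def combine_c_indices(c: Sequence[float], se: Sequence[float],
                      t_crit: float = T_975_DF9) -> Tuple[float, float, float]:
    """Inverse-variance weighted mean of regional C indices with a t-based CI.

    ASSUMPTION: the variance of the combined estimate is taken from the
    weighted spread of the regional estimates around the combined value;
    the source does not specify which estimator was used.
    """
    weights = [1.0 / (s * s) for s in se]
    total = sum(weights)
    pooled = sum(w * ci for w, ci in zip(weights, c)) / total
    k = len(c)
    variance = sum(w * (ci - pooled) ** 2 for w, ci in zip(weights, c)) / ((k - 1) * total)
    half_width = t_crit * math.sqrt(variance)
    return pooled, pooled - half_width, pooled + half_width


# Hypothetical regional C indices and standard errors (not study data).
regional_c = [0.64, 0.66, 0.67, 0.68, 0.69, 0.70, 0.71, 0.72, 0.74, 0.76]
regional_se = [0.003, 0.004, 0.004, 0.005, 0.004, 0.003, 0.005, 0.004, 0.005, 0.005]
combined, lower, upper = combine_c_indices(regional_c, regional_se)
```

Because the interval reflects the spread between regions rather than the within-region sampling error, it can be much wider than any single regional CI, which is consistent with the wide pooled interval reported for women (0.655-0.710) next to narrow regional intervals.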
Recalibration significantly improved the discrimination of the WHO model in the overall population, highlighting the importance of recalibration in different regions of China. Recalibration by region is equivalent to adding the region as a predictor. Due to the significant differences in the spatial patterns of incidence of cardiovascular disease and the prevalence of major cardiovascular disease risk factors in China, [13][14][15] we observed a significant improvement in discrimination in the overall population. However, recalibration had little impact on discrimination in each study region. Since recalibration changed the predicted risk but not the order of predicted risk for each participant, 25 the discrimination of both the coronary artery disease and the stroke submodels remained unchanged in each study region. Therefore, the discrimination performance of the total cardiovascular disease model was not greatly affected.
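The argument above, that region-specific recalibration preserves the ranking within a region but can change the ranking across regions, can be illustrated with a toy example. The sketch below makes several simplifying assumptions that are not taken from the study: outcomes are binary without censoring, recalibration is a logit-scale intercept shift, and all numbers are invented.

```python
import math
from typing import Sequence


def concordance(outcome: Sequence[int], risk: Sequence[float]) -> float:
    """Share of (case, non-case) pairs where the case has the higher risk; ties count 0.5."""
    cases = [r for y, r in zip(outcome, risk) if y == 1]
    controls = [r for y, r in zip(outcome, risk) if y == 0]
    score = sum(1.0 if c > n else 0.5 if c == n else 0.0
                for c in cases for n in controls)
    return score / (len(cases) * len(controls))


def recalibrate(p: float, intercept: float, slope: float = 1.0) -> float:
    """Monotone recalibration on the logit scale (assumed form; requires slope > 0)."""
    logit = math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-(intercept + slope * logit)))


# Two hypothetical regions with identical uncalibrated predictions.
risk = [0.1, 0.2, 0.3, 0.4]
outcome_a = [0, 0, 1, 0]  # low-incidence region
outcome_b = [1, 0, 1, 1]  # high-incidence region

# Region-specific recalibration: shift region A down and region B up.
risk_a_new = [recalibrate(p, -1.0) for p in risk]
risk_b_new = [recalibrate(p, +1.0) for p in risk]

# Within each region the ordering is unchanged, so concordance stays at 2/3.
within_a = (concordance(outcome_a, risk), concordance(outcome_a, risk_a_new))
within_b = (concordance(outcome_b, risk), concordance(outcome_b, risk_b_new))

# In the pooled sample the ordering across regions changes.
pooled_before = concordance(outcome_a + outcome_b, risk + risk)              # 0.625
pooled_after = concordance(outcome_a + outcome_b, risk_a_new + risk_b_new)   # 0.8125
```

The pooled concordance rises only because participants from the high-incidence region are now ranked above most participants from the low-incidence region, which mirrors the statement that recalibration by region acts like adding region as a predictor.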
The differences in discrimination performance of the WHO model among the 10 study regions persisted after recalibration and could not be explained by the spatial patterns mentioned above. China is a large, rapidly developing upper-middle-income country, where cardiovascular disease risk factors might differ from the well-established risk factors in high-income countries. For example, risk factors such as environmental hazards in the home, work and broader outdoor environment might also influence the incidence of cardiovascular disease in China. These risk factors were not included in the current model and might affect the discrimination of the model to varying degrees in different regions. The discrimination of the WHO model was not good (C index < 0.7) in some study regions, such as Huixian, Liuyang and Nangang. There may be specific risk factors in these regions that need to be determined. However, other known cardiovascular disease risk factors might help improve risk prediction. Specifically, the current model could be used to screen a subgroup of people with a relatively high risk of cardiovascular disease; subsequently, other

Table 2 footnotes
a Since the outcome definition here is different from that in Fig. 3, there are some differences between the C indices before recalibration and those in Fig. 3.
b The combined C index was the sum of the C index of each study region weighted by inverse variance (meta-analysis). We calculated the 95% CI of the combined C index by using the t distribution with nine degrees of freedom.
Notes: We defined cardiovascular disease according to the definition used by the derivation process of the WHO non-laboratory cardiovascular disease risk model. We recalibrated the WHO model in each study region. More details of the methods are in the authors' online repository. 19 Harrell's C index estimates the probability of the model correctly predicting who will have a cardiovascular disease event first in a randomly selected pair of participants. The value of this index is between 0.5 and 1.0. Generally, a C index above 0.7 indicates a good prediction model.

20 Other physical examination indicators such as ankle-brachial index and arterial stiffness, psychosocial and work stress, and environmental exposure would also be expected to improve risk prediction. 3,4 However, these indicators are not easily available and measurable, limiting their possible application.
We adopted different definitions of cardiovascular disease outcomes in the recalibration process of the WHO model, but the calibration performance of the recalibrated WHO model approached the ideal level regardless of the definition used. These findings suggest that the outcome definitions adopted when the model is applied in practice can differ from those used in the model derivation process, indicating the flexibility of the recalibration method proposed by developers of the WHO model. 12 The main factor affecting the calibration performance of the recalibrated model is more likely to be the reliability of the data source used to generate the recalibration parameters. However, different outcome definitions affect the interpretation of the model. The ratio of the incidence of stroke to incidence of coronary artery disease is higher in China than in high-income countries. 12 The WHO model adopted a narrower definition of coronary artery disease and a broader definition of stroke than the China Kadoorie Biobank definition. When adopting the outcome definition in the derivation data set of the WHO model, the model mainly predicts the risk of stroke in the present population. In addition, major coronary events (the definition used in the WHO model derivation process) are well defined and measured consistently across studies. However, this narrower definition might underestimate the overall coronary artery disease burden.
Currently, there is no consensus on the definition of the disease outcomes for use in cardiovascular disease risk prediction models. Different definitions of outcome have different implications in different contexts: public health, health economics or society in general. Future studies need to determine the most appropriate outcome definition for the context.
The advantages of the present study are that it provided a large external validation study of the WHO model in the Chinese population, with good coverage of regions with different burdens of cardiovascular disease subtypes. All 10 study regions adopted identical procedures and standardized protocols, allowing comparison and pooling of results from the different regions. Less than 1% of China Kadoorie Biobank participants were lost to follow-up after an average of 11 years. There were some limitations to the study, however. First, we were unable to validate the laboratory-based WHO model, since information on blood lipid levels was only available for a subset of participants. Previous studies have suggested the laboratory-based and non-laboratory-based WHO models have similar predictive performances. 12,16,27 However, the model which excludes laboratory biomarkers is more likely to be used in lower-resource regions. Second, the recalibrated WHO model for each study region should be considered a new model. External validation studies are warranted before the model is applied. Third, like most large-scale cohorts, the participants recruited at baseline were volunteers willing to participate in the study. However, the selection bias caused by loss to follow-up is very small in the China Kadoorie Biobank study. Fourth, the current analyses included only inpatient events, which mainly correspond to more severe conditions. Participants with low socioeconomic status might have delayed hospital visits, which could narrow the difference in hospital visits among groups with different socioeconomic statuses. However, our recalibration significantly improved the discrimination among participants with low socioeconomic status. 19
Based on a large population-based cohort of Chinese adults, we found that the WHO cardiovascular risk prediction model for East Asia, using non-laboratory-based parameters, was not directly applicable to different regions of China.
The model needs to be recalibrated before being used in a specific region in China. In future, to generate parameters for model recalibration, surveillance systems for cardiovascular diseases and risk factors need to be established in different regions of China.

management teams. The members of the steering committee and collaborative group are listed in the online repository. 19
